1.  Introduction

This paper presents a model, variations of which have been considered by
Anscombe [1], Colton [4], and others, which is relevant to the problem of
sequential testing in clinical trials. This model is the same as one discussed by
Chernoff and Ray [3] and by Wurtele [7] in a sampling inspection problem and
is naturally related to a one armed bandit problem. The object of this paper is
to demonstrate that techniques exist for dealing with some of the technical
problems raised by these and similar models. A few of the insights derived from the
results on the one armed bandit problem will be described in terms of nominal
significance levels corresponding to the rejection of a new drug.

The model is oversimplified for many practical applications. Alternative
models, including a two armed bandit problem, are described. An important element
in most of these models is the horizon, consisting of the total number of anticipated
patients to be treated.

2.  The  model 

Suppose that a new drug is produced to treat an illness for which the treatment
in the past has been a standard drug with known properties. We shall assume
here that the result of the use of the drug can be classified simply as a success
or failure in the treatment, and once one drug is applied, treatment cannot
shift to the other. Then the known drug is characterized by a known probability
p0 of success while the new drug has unknown probability p of success. If it is
anticipated that a horizon of N patients will have to be treated by one drug or
another, the expected number of successes, given that the new drug is used n
times, is np + (N - n)p0 = Np0 + n(p - p0).

Clearly, the expected number of successes attains a maximum of Np0 if p < p0
(with n = 0) and Np if p > p0 (with n = N). In view of the ignorance of p, it
is desired to select a sequential procedure to maximize the expected number of
successes, which is equal to

(2.1)  $N p_0 + E[n(p - p_0)]$,

where n is possibly a random quantity determined by the procedure. Since p0
is known, it is apparent that a reasonable procedure ought to consist of sampling
the new drug until the data indicate it to be inferior, at which time one reverts
to the old drug. If the data indicate the new drug to be superior one never
reverts to the old one. The reader should bear in mind that the word reasonable
is applied in the context of the model, which incidentally neglects the cost of
administering the new drug and the possible need for control groups. The
problem that remains is to describe what constitutes sufficient indication to stop
using the new drug. In this situation a variety of factors suggests that it is
appropriate to study the problem from a Bayesian point of view, especially when
N is large. Then the problem of finding a sequential procedure to maximize

(2.2)  $E\{n(p - p_0)\}, \quad n \le N$,

where  p  has  a  given  prior  distribution,  is  a  well  determined  optimization  problem. 
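
For a discrete horizon, the optimization in (2.2) can be carried out by backward induction over the number of successes and failures observed so far. The following is a minimal sketch under stated assumptions, not the paper's own computation: it assumes a Beta(a, b) prior on p, and the function name `one_armed_bandit` and its return values are hypothetical. It follows the stop-and-revert structure described above, in which the new drug is used until the data call for reverting to the old one.

```python
def one_armed_bandit(N, p0, a=1.0, b=1.0):
    """Backward induction for max E{n (p - p0)}, n <= N, with p ~ Beta(a, b).

    Returns (value, first_stop), where first_stop is the number of
    consecutive initial failures after which stopping becomes optimal
    (None if the new drug is never abandoned on that path).
    """
    # value[s] = optimal expected future gain after n trials with s successes.
    # With n = N no patients remain, so the future gain is 0.
    value = [0.0] * (N + 1)
    first_stop = None
    for n in range(N - 1, -1, -1):
        new_value = [0.0] * (n + 1)
        for s in range(n + 1):
            p_hat = (a + s) / (a + b + n)      # posterior mean of p
            # Expected gain from one more patient on the new drug,
            # followed by optimal behaviour afterwards.
            go = (p_hat - p0) + p_hat * value[s + 1] + (1 - p_hat) * value[s]
            new_value[s] = max(0.0, go)        # 0 means revert to the old drug
            if s == 0 and go <= 0.0:
                first_stop = n                 # ends as the smallest such n
        value = new_value
    return value[0], first_stop


if __name__ == "__main__":
    for horizon in (10, 100, 1000):
        print(horizon, one_armed_bandit(horizon, p0=0.5))
```

With a uniform prior, p0 = 0.5 and N = 2, the sketch returns an expected gain of 1/12 and calls for stopping after a single initial failure, which can be checked by hand.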

Exactly  this  problem  arises  also  in  the  following  rectifying  sampling  inspection 
problem.  A  lot  of  N  items  has  been  produced  by  a  process  such  that  distinct 
items  are  independently  defective  with  unknown  probability  p.  The  cost  of 
inspecting  an  item  is  c.  If  a  defective  item  is  found,  it  is  replaced  from  a  pile 
of  good  items.  The  cost  of  sending  a  customer  a  defective  item  is  k  times  more 
than  the  cost  of  replacing  it.  If  p  <  c/k,  it  would  be  preferable  to  send  out  the 
lot  without  inspection,  while  if  p  >  c/k,  100  per  cent  inspection  is  desirable.  In 
fact  a  sampling  plan  where  n  items  are  inspected  leads  to  an  expected  cost  of 

(2.3)  $nc + (N - n)pk = Npk + nk\left(\frac{c}{k} - p\right)$.

Hence it is desired to maximize E{n(p - p0)} where p0 = c/k. It is in this context
that  the  problem  was  first  discussed  by  Wurtele  [7],  and  subsequently  by 
Chernoff  and  Ray  [3]. 
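
The link between the inspection cost and the clinical trials objective rests on a short rearrangement, written out here as a worked step. The only added assumption is that expectations are taken over both the prior for p and the sampling plan.

```latex
\begin{align*}
nc + (N - n)pk &= Npk + n(c - pk) \\
               &= Npk - nk\,(p - c/k), \\
E\{\text{cost}\} &= Nk\,E(p) - k\,E\{n(p - p_0)\}, \qquad p_0 = c/k .
\end{align*}
```

Since Nk E(p) does not depend on the sampling plan, minimizing the expected cost is the same as maximizing E{n(p - p0)}.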

A related problem is the following. Let X1, X2, ..., Xn be independent
identically distributed random variables with unknown mean μ (and otherwise known
distribution). Given a prior distribution for μ, select n ≤ N sequentially, so as
to maximize

(2.4)  $E(X_1 + X_2 + \cdots + X_n) = E\{n\mu\}$.

Viewing X1, X2, ... as the winnings from a “one armed bandit” by a player
who can stay at most long enough to play N games, we may call this problem
a one armed bandit problem. The two previous problems correspond to the
special case where Xi = 1 - p0 or -p0 with probabilities p and 1 - p,
respectively, so that μ = p - p0.

The normal continuous time version of this problem is particularly interesting.
Let X(t), representing the gambler’s gain at time t, be a Wiener process with
unknown drift μ and known variance σ^2 per unit time. That is to say, for t1 < t2,
X(t2) - X(t1) is normally distributed with mean μ(t2 - t1) and variance σ^2(t2 - t1)
and is independent of the path X(t) for 0 ≤ t ≤ t1. Then the continuous time
one armed bandit problem consists of finding a sequential procedure for
selecting a stopping time T ≤ N, so as to maximize E{X(T)} = E{Tμ}, where the


SEQUENTIAL  MODELS  FOR  CLINICAL  TRIALS  807 

unknown drift is assumed to have a normal prior distribution of mean μ0 and
variance σ0^2.

To relate the continuous one armed bandit problem with the discrete one,
observe that for integer values of t1 and t2, X(t2) - X(t1) corresponds to the sum
of the observations from t1 + 1 to t2. In our model for clinical trials, p - p0
corresponds to μ while σ^2 could be thought of as being approximately p0(1 - p0).
One anticipates that the solution for the continuous time one armed bandit
problem would serve as a reasonable approximation to the discrete versions, especially
if N is large.

Chernoff and Ray [3] characterized certain asymptotic properties of the
solution of the continuous problem and indicated a rough approximation to the
optimal procedure. More refined approximations can be carried out by the use
of a numerical computation involving backward induction.


3.  The  solution 

To describe the solution of the continuous time version of the one armed
bandit problem, consider first the limiting case where the normal prior
distribution has variance σ0^2 = ∞, corresponding to what has been termed vague prior
knowledge. The asymptotic results of [3] combined with some freehand
interpolation and a backward induction suggest the approximation of the solution
presented in table I. Here x̃(t) represents the boundary of the optimal stopping
region. That is, the optimal procedure calls for stopping if X(t) ≤ x̃(t).


TABLE I

Approximate Solution of the
Continuous Time One Armed Bandit Problem

β = nominal significance level = ∫_{-∞}^{α} (2π)^{-1/2} exp(-u^2/2) du.

    t/N      α = x̃/(σ t^{1/2})      β
    1.00          -0.0             0.50
    0.90          -0.20            0.42
    0.75          -0.36            0.36
    0.50          -0.56            0.29
    0.25          -0.78            0.22
    0.10          -1.08            0.14
    0.01          -2.05            0.02
    10^-4         -3.55
    10^-6         -4.61            2·10^-6


The quantities α and β correspond to a nominal significance level as follows. Suppose
that at time t the player stopped and decided to test (one tail) the hypothesis
μ = 0. The observation X(t) would correspond to α = X(t)/(σ t^{1/2}) standard
deviations from the mean 0. Thus, α is the number of standard deviations required
for the game to be stopped at time t and β is the corresponding nominal
significance level. From a classical point of view one can regard the player as
continuously testing the hypothesis μ = 0. As time t varies from 0 to N, the nominal
significance level becomes less stringent, increasing from 0 to 1/2. If and when
he rejects μ = 0 (in favor of μ < 0), he stops playing. Although this nominal
significance level serves as a convenient description of the procedure, its use
should not be confused with that of the standard significance level, which is not
applicable here.
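
The relation between α and β can be checked numerically, since β is the standard normal distribution function evaluated at α. The sketch below is illustrative only: the helper names `nominal_level` and `should_stop` are hypothetical, and only the rows of table I down to t/N = 0.01 are used.

```python
from statistics import NormalDist

# (t/N, alpha, published beta) for the rows of table I down to t/N = 0.01.
TABLE_I = [
    (1.00, 0.00, 0.50), (0.90, -0.20, 0.42), (0.75, -0.36, 0.36),
    (0.50, -0.56, 0.29), (0.25, -0.78, 0.22), (0.10, -1.08, 0.14),
    (0.01, -2.05, 0.02),
]


def nominal_level(alpha):
    """One-tailed nominal significance level beta = Phi(alpha)."""
    return NormalDist().cdf(alpha)


def should_stop(x, t, sigma, alpha_boundary):
    """Stop when X(t) = x is at or below the boundary alpha * sigma * sqrt(t)."""
    return x <= alpha_boundary * sigma * t ** 0.5


if __name__ == "__main__":
    for ratio, alpha, beta in TABLE_I:
        print(f"t/N={ratio:5.2f}  alpha={alpha:6.2f}  "
              f"Phi(alpha)={nominal_level(alpha):.3f}  table={beta:.2f}")
```

The computed values agree with the tabulated β to the two decimals shown.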

Given X(t) = x, the posterior distribution of μ under the vague prior is normal
with mean x/t and variance σ^2/t. This fact permits us to reduce the solution of the problem,
where the gambler with horizon N0 has the normal prior distribution N(μ0, σ0^2),
to the previous problem by simply initiating the Wiener process from the point

(3.1)  $(x_0, t_0) = (\mu_0 \sigma^2/\sigma_0^2,\ \sigma^2/\sigma_0^2)$

and letting

(3.2)  $N = N_0 + t_0 = N_0 + \sigma^2/\sigma_0^2$.
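
The starting point and the enlarged horizon follow from matching the posterior mean x/t and variance σ^2/t at the starting point to the prior mean and variance. As a worked step, offered as a sketch of the reasoning rather than a quotation from [3]:

```latex
\begin{align*}
\frac{x_0}{t_0} = \mu_0, \qquad \frac{\sigma^2}{t_0} = \sigma_0^2
\quad &\Longrightarrow \quad
t_0 = \frac{\sigma^2}{\sigma_0^2}, \qquad
x_0 = \mu_0 t_0 = \frac{\mu_0 \sigma^2}{\sigma_0^2}, \\
N - t_0 &= N_0 .
\end{align*}
```

The prior thus acts like t0 units of earlier observation time, and the N0 units still available to the gambler run from t0 to N.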

The asymptotic results of [3] indicate that as t/N → 0,

(3.3)  $\beta \approx 2t/N$

and

(3.4)  $\alpha \approx -\{-2\log(t/N) - \log[-16\pi \log(t/N)]\}^{1/2}$.

As t/N → 1,

(3.5)  $\alpha \approx -0.639\,(1 - t/N)^{1/2}$.
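
The small t/N approximation can be compared with table I. The sketch below takes (3.4) in the form α ≈ -{-2 log(t/N) - log[-16π log(t/N)]}^{1/2}, with natural logarithms, and checks that the implied nominal level is close to 2t/N as (3.3) states. The function name `alpha_small_ratio` is hypothetical.

```python
import math
from statistics import NormalDist


def alpha_small_ratio(ratio):
    """Asymptotic boundary (3.4), in standard deviations, for small ratio = t/N."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    inner = -2.0 * math.log(ratio) - math.log(-16.0 * math.pi * math.log(ratio))
    if inner <= 0.0:
        raise ValueError("ratio is too large for the small-t/N approximation")
    return -math.sqrt(inner)


if __name__ == "__main__":
    for ratio in (1e-2, 1e-4, 1e-6):
        alpha = alpha_small_ratio(ratio)
        beta = NormalDist().cdf(alpha)
        print(f"t/N={ratio:.0e}  alpha={alpha:.3f}  "
              f"Phi(alpha)/(2t/N)={beta / (2 * ratio):.3f}")
```

For t/N = 10^-4 the formula gives α within a few hundredths of the tabulated -3.55, and the ratio of Φ(α) to 2t/N moves toward 1 as t/N decreases.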

When the horizon N0 is large, the case t/N → 0 is especially important. Here
it has been shown that the expected gain is approximated by

(3.6)  $N_0 \sigma_0 [\varphi(\alpha_0) + \alpha_0 \Phi(\alpha_0)] - \ldots [\log(1 + \ldots)]^2 \varphi(\alpha_0)$,

where α0 = μ0/σ0, while φ(u) = (2π)^{-1/2} e^{-u^2/2} and Φ(α) = ∫_{-∞}^{α} φ(u) du. The first
and larger term corresponds to the expected gain if μ were selected from the
normal distribution with mean μ0 and variance σ0^2 and immediately told to the
player, who would proceed to play the entire allotted time N0 if μ > 0 and
refuse to play otherwise. Thus, the second term represents the expected loss due
to ignorance of μ and is of the order of magnitude of (log N0)^2, which increases
slowly as N0 becomes large.
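
The interpretation of the first term, N0 σ0 [φ(α0) + α0 Φ(α0)] with α0 = μ0/σ0, can be verified numerically: a player who is told μ gains N0 max(μ, 0), and averaging this over the normal prior gives the closed form. The sketch below compares the closed form with a midpoint-rule integral. Both function names and the integration settings are illustrative assumptions.

```python
import math
from statistics import NormalDist


def informed_gain(n0, mu0, sigma0):
    """Closed form N0 * sigma0 * [phi(a0) + a0 * Phi(a0)], a0 = mu0 / sigma0."""
    std = NormalDist()
    a0 = mu0 / sigma0
    return n0 * sigma0 * (std.pdf(a0) + a0 * std.cdf(a0))


def informed_gain_numeric(n0, mu0, sigma0, steps=20000):
    """Midpoint-rule estimate of N0 * E[max(mu, 0)] for mu ~ N(mu0, sigma0^2)."""
    prior = NormalDist(mu0, sigma0)
    upper = max(mu0, 0.0) + 10.0 * sigma0   # the tail beyond this is negligible
    h = upper / steps
    total = sum((i + 0.5) * h * prior.pdf((i + 0.5) * h) for i in range(steps))
    return n0 * total * h


if __name__ == "__main__":
    print(informed_gain(100, 0.1, 0.5), informed_gain_numeric(100, 0.1, 0.5))
```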

Now let us relate the clinical trials problem posed in section 2 to the
continuous time one armed bandit problem by assuming that the horizon N0 is large.
Assuming vague information about p, let

(3.7)  $t/N = n/N_0$

and

(3.8)  $\alpha = \dfrac{n^{1/2}(\hat{p} - p_0)}{[p_0(1 - p_0)]^{1/2}}$,

where p̂ is the usual estimate of p based on the first n trials. Then one may
substitute directly in table I to determine the stopping time. When n is small
compared to N0, one has the asymptotic relation

(3.9)  $\alpha \approx -\{2\log(N_0/n) - \log[16\pi \log(N_0/n)]\}^{1/2}$

at the boundary.

Consider the following problem. If each trial of the new drug leads to failure,
how many successive failures would be required before one should stop the use
of the new drug? Clearly, the answer to this question should depend on N0.
Substituting in the above formula, we obtain

(3.10)  $n_0 \approx 2\,\dfrac{1 - p_0}{p_0}\,(\log N_0)$.

This result should not be taken too seriously since it is implicitly based on the
normal approximation to the distribution of p̂, which is not especially good for
approximating the probability of n successive failures, (1 - p0)^n. There is reason
to expect that the correct answer to the question posed would be between the
answer suggested above and

(3.11)  $n_0 = (\log N_0)/[-\log(1 - p_0)]$.

In  any  case  the  order  of  magnitude  log  N0  is  important  as  we  shall  see  in  our 
subsequent  discussion. 
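
To give a feel for the two candidate answers, the sketch below evaluates n0 ≈ 2[(1 - p0)/p0] log N0 from (3.10) and n0 = (log N0)/[-log(1 - p0)] from (3.11) for a few horizons. The function names and the choice p0 = 0.5 are illustrative assumptions, and natural logarithms are used throughout.

```python
import math


def n0_normal_approx(horizon, p0):
    """(3.10): successive failures needed under the normal approximation."""
    return 2.0 * (1.0 - p0) / p0 * math.log(horizon)


def n0_binomial_tail(horizon, p0):
    """(3.11): the n for which (1 - p0)^n equals 1 / horizon."""
    return math.log(horizon) / (-math.log(1.0 - p0))


if __name__ == "__main__":
    for horizon in (10**2, 10**4, 10**6):
        print(horizon,
              round(n0_normal_approx(horizon, 0.5), 1),
              round(n0_binomial_tail(horizon, 0.5), 1))
```

Both quantities grow in proportion to log N0, so multiplying the horizon by 100 adds only a fixed number of required failures.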

If the investigator has some strong feelings about the unknown p which can
be represented by a prior Beta distribution B(a, b), for which the mean is
a/(a + b) and variance ab/[(a + b)^2 (a + b + 1)], then the above results are
applicable after replacing N0 by N = N0 + (a + b), and assuming that n = a + b
fictitious trials resulting in a successes had taken place.

It is of some interest to tabulate the estimate p̂ of p for which the clinical
investigator should stop the new drug. We have

(3.12)  $\hat{p} = p_0 + \alpha\,[p_0(1 - p_0)]^{1/2}/n^{1/2}$,

and table II gives some insight when we consider that 2[p0(1 - p0)]^{1/2} is close
to 1 for a broad range of p0. The large entries corresponding to small n should
not be taken too seriously since the normal approximation to the binomial
distribution, on which they are based, is not very good.


TABLE II

(p0 - p̂)/[p0(1 - p0)]^{1/2} = -α/n^{1/2}

    n \ N     10^2     10^4     10^6
       1      2.05     3.55     4.61
       5      0.60     1.36     1.91
      10      0.34     0.91     1.30
      25      0.16     0.52     0.78
     100      0        0.21     0.36
    1000      -        0.03     0.09

4.  Alternative  models 

4.1.  We  shall  discuss  a  variety  of  situations  where  the  model  presented  in 
section  2  for  clinical  trials  requires  some  modification.  First  let  us  consider  the 
case  where  the  two  competing  treatments  being  compared  for  a  horizon  of  N 
patients  are  both  of  unknown  efficacy.  This  corresponds  to  the  two  armed 
bandit  problem  where  the  player  attempts  to  maximize 

(4.1)  $E\{X_1 + X_2 + \cdots + X_N\}$,

where Xn represents the value of the outcome of the nth trial, whose treatment
is determined by past history. The continuous time normal version of this
problem consists of maximizing

(4.2)  $E\{X_1(T_1) + X_2(T_2)\} = E\{T_1\mu_1 + T_2\mu_2\}, \quad T_1 + T_2 = N$,

where X1 and X2 are Wiener processes with unknown drifts μ1 and μ2 per unit
time and known variance σ^2 per unit time. Both μ1 and μ2 are assumed to have
normal independent prior distributions. At any given moment one is entitled
to observe either X1 or X2 depending on past history, Ti is the total time spent
observing Xi, and T1 + T2 = N.

While  a  special  version  of  the  two  armed  bandit  problem  was  solved  by 
Feldman  [5],  little  is  known  about  the  solution  of  this  one.  It  is  intuitively  clear 
that  the  solution  will  call  for  using  the  arm  (Wiener  process)  which  has  the 
higher  estimated  mean  until  a  balance  is  struck  between  the  difference  in  the 
estimated  means  and  the  amount  of  information  accumulated  on  each  arm  and 
the  remainder  of  the  horizon.  Thus,  for  a  large  horizon,  the  optimal  procedure 
may  call  for  the  arm  with  the  lower  estimated  mean  drift  if  that  estimate  is 
based  on  a  relatively  small  sample  time. 

4.2.  Suppose  two  unknown  treatments  are  being  compared  in  a  variety  of 
locations,  at  each  one  of  which  the  number  of  patients  available  is  rather  small. 
In  that  case  control  considerations  suggest  the  model  proposed  by  Anscombe 
[1]  and  Colton  [4].  Here,  patients  are  paired  and  one  of  each  pair  is  randomly 
selected  for  one  treatment  while  the  other  is  given  the  second  treatment.  The 
number  of  pairs  to  be  treated  n  is  determined  sequentially  after  which  the 
remainder  of  the  horizon  N  —  2n  are  given  the  treatment  estimated  to  be 
superior. 

If μ is the mean of the difference X in the treatment effects, one seeks to
maximize

(4.3)  $E\{X_1 + X_2 + \cdots + X_n - (N - 2n) \ldots\}$,

where ε(μ) is the probability of selecting the wrong treatment, given μ. Colton
presents  a  detailed  series  of  analyses  of  procedures  which  have  certain  optimality 
properties.  One  shortcoming  is  the  restriction  of  his  sequential  procedures  to 
those of the Wald type, which may lead to rather poor results when the horizon
is very large. In particular, if one treatment is substantially inferior to the other
and N is very large, the Colton procedures call for sample sizes of order N^{1/2}
instead of the more reasonable log N.

Anscombe derives an “outer” bound on the optimal Bayes procedure. This
bound corresponds to the use of a nominal significance level of n/N. This model
is subject to the same technical approach as the one armed bandit problem, and
for n/N small, the nominal significance level should be approximately 2n/N. In
the sense that the difference in the α corresponding to these significance levels
is small, the Anscombe result furnishes a good approximation to the optimal
procedure.
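
The claim that the two nominal levels lead to nearly the same α can be illustrated by inverting the standard normal distribution function at n/N and at 2n/N. This is a sketch with hypothetical helper names, not a reproduction of Anscombe's derivation.

```python
from statistics import NormalDist


def alpha_for_level(level):
    """Number of standard deviations whose one-tailed level equals `level`."""
    return NormalDist().inv_cdf(level)


def alpha_gap(ratio):
    """Difference between the boundaries for levels 2n/N and n/N."""
    return alpha_for_level(2.0 * ratio) - alpha_for_level(ratio)


if __name__ == "__main__":
    for ratio in (1e-2, 1e-4, 1e-6):
        print(f"n/N={ratio:.0e}  alpha(n/N)={alpha_for_level(ratio):.2f}  "
              f"alpha(2n/N)={alpha_for_level(2 * ratio):.2f}  "
              f"gap={alpha_gap(ratio):.2f}")
```

Doubling the nominal level moves the boundary by less than 0.3 standard deviations over this range, and the gap narrows as n/N decreases.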

4.3.  If  the  experimental  situation  calls  for  controls  but  one  treatment  is  well 
known,  our  problem  may  be  regarded  as  maximizing 

(4.4)  $E\{X_1 + X_2 + \cdots + X_n\} = E\{n\mu\}, \quad n \le N/2$,

where n is the number of pairs treated before returning to the known treatment.
Here μ is the mean of X, the difference in the effect between the new and old
treatments. This problem is also a one armed bandit problem with the horizon
N replaced by N/2, the available number of pairs.

5.  Discussions  and  more  models 

The  models  discussed  so  far  are  oversimplified  to  say  the  least.  In  practical 
problems  where  specific  shortcomings  of  the  models  are  important,  the  models 
can  be  modified  so  as  to  be  more  meaningful  and  still  capable  of  analysis. 

One serious difficulty is the specification of the horizon N. Even if one can
regard such a conception as meaningful because of anticipated changes in
technology, it is difficult to make the choice of a number to represent N. The fact that
many of the important decisions and losses involve N only through log N can
serve to give the experimenter the assurance that an incorrect specification of N
will hardly affect the procedures. It is remarkable how little effect is due to
changing N from 10^6 to 10^10.

Should  one  wish  to  conceive  of  N  as  infinite,  it  is  possible  to  consider  a  model 
where  the  effects  of  future  treatments  are  discounted.  Thus,  one  may  seek  to 
maximize 

(5.1)  $E\{X_1 + \rho X_2 + \rho^2 X_3 + \cdots + \rho^{n-1} X_n + \cdots\}$,

where Xi is the outcome of the ith treatment, which can be one of the two
alternatives, and ρ is the discount factor between 0 and 1.

The problem of medical ethics seems to be unavoidable in experiments
involving clinical treatments. It is difficult to imagine a reasonable experiment where
one can be assured that subjects will never be given treatments estimated to be
inferior. One should be prepared to try a new treatment again even if it fails
on its first trial and the skimpy evidence is unfavorable to it. One approach to
quantifying the cost of treating a patient with a drug currently estimated to be
inferior is to add a charge, real or imaginary, proportional to p0 - p̂ when a
patient is given the treatment with estimated probability of success p̂ if p̂ < p0.

I am not well enough acquainted with the field to present a reasonable
discussion of the ethical problem and shall not attempt to do so.

The  Anscombe  and  Colton  model  represents  a  simplification  of  one  presented 
previously  by  Maurice  [6].  This  more  complicated  model  also  had  a  cost  for 
experimentation  on  the  first  n  pairs  of  patients.  The  Anscombe  and  Colton 
model  neglects  the  cost  of  experimentation.  The  opposite  extreme  version  of  the 
Maurice  model  would  be  to  summarize  the  effect  of  giving  the  entire  horizon 
the  wrong  treatment  by  a  multiple  of  the  mean  difference.  This  extreme  leads 
simply to the more standard problem of sequentially testing whether the
unknown mean is positive or negative.

A variety of procedures and models are presented by Armitage [2]. While the
procedures  are  not  optimal  Bayes  procedures,  the  general  impression  one  receives 
is  that  they  are  rather  efficient  for  situations  where  sample  sizes  are  expected 
to  be  moderate,  but  that  there  is  substantial  loss  when  very  large  samples  are 
anticipated.  Considerable  work  remains  to  be  done  in  analyzing  various  models 
and  comparing  optimal  with  standard  procedures. 

As  a  final  remark,  I  would  like  to  add  that  one  sometimes  encounters  a  naive 
conception  that  situations  where  control  is  desired  require  pairing  of  treatments. 
In  many  instances  control  may  be  achieved  when  two  treatments  are  given  in 
ratios  of  1  to  2,  or  1  to  3,  and  so  forth.  Not  only  can  such  ratios  give  more 
efficient  results  but  it  is  quite  likely  that  ethical  requirements  would  also  point 
in  this  direction. 